Cluster Evaluation of Density Based Subspace Clustering

Authors

  • Rahmat Widia Sembiring
  • Jasni Mohamad Zain
Abstract

Clustering real-world data often runs into the curse of dimensionality, because such data frequently consist of many dimensions. Multidimensional data clustering can be evaluated through a density-based approach. Density-based approaches follow the paradigm introduced by the DBSCAN clustering algorithm. In this approach, the density of each object's neighbourhood is calculated with respect to MinPoints. Cluster membership changes as the density of each object's neighbourhood changes. The neighbours of each object are typically determined using a distance function, for example the Euclidean distance. In this paper, the SUBCLU, FIRES and INSCY methods are applied to clustering 6x1595 dimension synthetic datasets. IO Entropy, F1 Measure, coverage, accuracy and time consumption are used as performance evaluation parameters. The evaluation results show that the SUBCLU method requires considerable time to process subspace clustering; however, its coverage is better. Meanwhile, the INSCY method gives better accuracy than the other two methods, although its computation time is longer.

Index Terms: clustering, density, subspace clustering, SUBCLU, FIRES, INSCY.

1 DATA MINING AND CLUSTERING

Data mining is the process of extracting data from large databases, used as a technology to generate the required information. Data mining methods can be used to predict future data trends, estimate their scope, and serve as a reliable basis in the decision-making process. Data mining functions include association, correlation, prediction, clustering, classification, trend analysis, outlier and deviation analysis, and similarity and dissimilarity analysis. One frequently used data mining method for finding patterns or groupings in data is clustering. Clustering is the division of data into groups of objects that have similarities.
Representing the data as smaller clusters makes the data much simpler; however, important pieces of data can also be lost, so the clusters need to be analyzed and evaluated. This paper is organized into several sections. Section 2 presents cluster analysis. Section 3 presents density-based clustering, followed by density-based subspace clustering in Section 4. Our proposed experiment, based on performance evaluation, is discussed in Section 5, followed by concluding remarks in Section 6.

2 CLUSTER ANALYSIS

Cluster analysis is a quite popular method of discretizing data [1]. Cluster analysis, performed with multivariate statistics, identifies objects that are similar to one another and separates them from other objects, so that the variation between objects within a group is smaller than the variation with objects in other groups. Cluster analysis consists of several stages, beginning with the separation of objects into clusters or groups, followed by interpreting the characteristic values of the objects in each group and labelling each group. The next stage is to validate the clustering results using a discriminant function.

3 DENSITY BASED CLUSTERING

Density-based clustering methods calculate the distance from each object to its nearest neighbours and measure each object against the objects in its local neighbourhood; an object that is relatively close to its neighbours is regarded as a normal object, and an object that is not is regarded as an outlier. In density-based clustering there are two notions to consider. The first is density-reachability: a point p is density-reachable from a point q with respect to Eps and MinPoints if there is a chain of points p_1, ..., p_n, with p_1 = q and p_n = p, such that p_{i+1} is directly density-reachable from p_i, as shown in Figure-1.
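The chain definition of density-reachability can be illustrated with a minimal Python sketch. It assumes Euclidean distance, points given as coordinate tuples, and the usual DBSCAN convention that a point is a core point when its Eps-neighbourhood (including itself) holds at least MinPoints points. The helper names `region_query`, `is_core`, `directly_density_reachable` and `density_reachable` are hypothetical and chosen only for this illustration.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """Assumed core-point test: the eps-neighbourhood holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q if q is core and p lies within eps of q."""
    return is_core(points, q, eps, min_pts) and math.dist(points[p], points[q]) <= eps

def density_reachable(points, p, q, eps, min_pts):
    """Search for a chain q = p_1, ..., p_n = p in which every step is
    directly density-reachable from the previous point."""
    seen = {q}
    frontier = [q]
    while frontier:
        current = frontier.pop()
        if not is_core(points, current, eps, min_pts):
            continue  # only core points can extend the chain
        for j in region_query(points, current, eps):
            if j == p:
                return True
            if j not in seen:
                seen.add(j)
                frontier.append(j)
    return False

# Five points on a line; indices 1 and 2 are core points for eps=1.1, min_pts=3.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(density_reachable(points, 3, 1, eps=1.1, min_pts=3))
print(density_reachable(points, 1, 3, eps=1.1, min_pts=3))
```

In this toy data set, point 3 can be reached from point 1 through the core point 2, but not the other way round, because point 3 is not a core point. Density-reachability is therefore not symmetric in general.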
  • Rahmat Widia Sembiring is with the Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300, Kuantan, Pahang Darul Makmur, Malaysia. E-mail: [email protected]
  • Jasni Mohamad Zain is Associate Professor at the Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300, Kuantan, Pahang Darul Makmur, Malaysia. E-mail: [email protected]

© 2010 Journal of Computing Press, NY, USA, ISSN 2151-9617, http://sites.google.com/site/journalofcomputing/
Journal of Computing, Volume 2, Issue 11, November 2010, www.journalofcomputing.org

Figure-1 Density Reachable

The second notion is density-connectivity: a point p is density-connected to a point q with respect to Eps and MinPoints if there is a point o such that both p and q are density-reachable from o with respect to Eps and MinPoints, as shown in Figure-2.

Figure-2 Density Connected

Density-based clustering algorithms classify objects based on object-specific functions. The most popular algorithm is DBSCAN [2]. In DBSCAN, data that are not close enough to other points to form a cluster are referred to as outliers and are eliminated. DBSCAN determines the number of clusters by itself once the inputs Eps (the maximum radius of the neighbourhood of a point) and MinPoints (the minimum number of points within an Eps-neighbourhood) are given. The algorithm is expressed in pseudocode in Figure-3 [Wikipedia].

DBSCAN(D, eps, MinPts)
   C = 0
   for each unvisited point P in dataset D
      mark P as visited
      N = getNeighbours(P, eps)
      if sizeof(N) < MinPts
         mark P as NOISE
      else
         C = next cluster
         expandCluster(P, N, C, eps, MinPts)

expandCluster(P, N, C, eps, MinPts)
   add P to cluster C
   for each point P' in N
      if P' is not visited
         mark P' as visited
         N' = getNeighbours(P', eps)
         if sizeof(N') >= MinPts
            N = N joined with N'
      if P' is not yet member of any cluster
         add P' to cluster C

Figure-3 DBSCAN Algorithm
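The DBSCAN pseudocode can be turned into a short runnable Python sketch. This is an illustrative translation, not the implementation used in the paper's experiments. It assumes Euclidean distance, a brute-force neighbour search, and cluster labels numbered from 0 with -1 marking noise. The names `dbscan`, `expand_cluster` and `get_neighbours` mirror the pseudocode, while the `labels` and `visited` lists are bookkeeping added for this example.

```python
import math

NOISE = -1  # label for points that belong to no cluster (assumed convention)

def get_neighbours(data, i, eps):
    """Brute-force eps-neighbourhood of data[i], including i itself."""
    return [j for j in range(len(data)) if math.dist(data[i], data[j]) <= eps]

def expand_cluster(data, labels, visited, p, neighbours, cluster, eps, min_pts):
    """Grow one cluster from the core point p by following core neighbours."""
    labels[p] = cluster
    queue = list(neighbours)
    i = 0
    while i < len(queue):
        q = queue[i]
        i += 1
        if not visited[q]:
            visited[q] = True
            q_neighbours = get_neighbours(data, q, eps)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # "N = N joined with N'"
        if labels[q] is None or labels[q] == NOISE:
            labels[q] = cluster  # q is not yet a member of any cluster

def dbscan(data, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(data)
    visited = [False] * len(data)
    cluster = -1
    for p in range(len(data)):
        if visited[p]:
            continue
        visited[p] = True
        neighbours = get_neighbours(data, p, eps)
        if len(neighbours) < min_pts:
            labels[p] = NOISE  # may later be relabelled as a border point
        else:
            cluster += 1
            expand_cluster(data, labels, visited, p, neighbours, cluster, eps, min_pts)
    return labels

# Two small square groups and one isolated point (made-up toy data).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
print(dbscan(data, eps=1.5, min_pts=3))
```

The brute-force neighbour search keeps the sketch short but costs quadratic time in the number of points. A spatial index would normally replace `get_neighbours` on larger data sets.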

Similar resources

An Efficient and Fast Density Conscious Subspace Clustering using Affinity Propagation

Subspace clustering is an eminent task to detect the clusters in subspaces. Density-based approaches assume the high-density region in the subspace as a cluster, but it creates density divergence problem. The proposed work improves the performance of Density Conscious subspace clustering (DENCOS) by utilizing the Affinity Propagation (AP) algorithm to detect the local densities for a dataset. I...

ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset

Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimens...

An Efficient Density Conscious Subspace Clustering Method using Top-down and Bottom-up Strategies

Clustering high dimensional data is an emerging research field. Most clustering techniques use distance measures to build clusters. In high dimensional spaces, traditional clustering algorithms suffer from a problem called the “curse of dimensionality”. Subspace clustering groups similar objects embedded in a subspace of the full space. Recent approaches attempt to find clusters embedded in subspace of h...

Density Conscious Subspace Clustering for High Dimensional Data using Genetic Algorithms

Clustering has been recognized as an important and valuable capability in the data mining field. Instead of finding clusters in the full feature space, subspace clustering is an emergent task which aims at detecting clusters embedded in subspaces. Most of previous works in the literature are density-based approaches, where a cluster is regarded as a high-density region in a subspace. However, t...

Journal title:
  • CoRR

Volume abs/1012.6009  Issue 

Pages  -

Publication date 2010